---
id: ai-vecs-python-client
title: Semantic Text Deduplication
subtitle: Finding duplicate movie reviews with Supabase Vecs.
breadcrumb: AI Quickstarts
---

This guide walks you through a "Semantic Text Deduplication" example using Colab and Supabase Vecs. You'll learn how to find similar movie reviews using embeddings and remove any that appear to be duplicates. You will:

  1. Launch a Postgres database that uses pgvector to store embeddings.
  2. Launch a notebook that connects to your database.
  3. Load the IMDB dataset.
  4. Use the sentence-transformers/all-MiniLM-L6-v2 model to create an embedding that represents the semantic meaning of each review.
  5. Search for duplicate reviews.
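Steps 4 and 5 come down to comparing embedding vectors. The following is a minimal in-memory sketch in plain Python, not the notebook's code: it assumes each review already has an embedding, measures the cosine distance between vectors, and flags a review as a duplicate when it falls within a threshold of a review that was already kept. The `find_duplicates` helper and the `0.05` threshold are hypothetical; in the notebook, the vectors are stored in a Vecs collection and the similarity search runs in Postgres.

```python
import math


def cosine_distance(a, b):
    """Cosine distance = 1 - cosine similarity (0.0 when vectors point the same way).

    Assumes neither vector is all zeros.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def find_duplicates(records, threshold=0.05):
    """Return ids of records that are near-duplicates of an earlier record.

    records: list of (id, vector) pairs, in the order they should be kept.
    threshold: hypothetical cutoff; a record closer than this to any kept
    record is treated as a duplicate.
    """
    kept = []        # (id, vector) pairs we have decided to keep
    duplicates = []  # ids flagged as near-duplicates
    for rec_id, vec in records:
        if any(cosine_distance(vec, kept_vec) < threshold for _, kept_vec in kept):
            duplicates.append(rec_id)
        else:
            kept.append((rec_id, vec))
    return duplicates


# Toy 3-dimensional vectors standing in for real review embeddings.
records = [
    ("r1", [1.0, 0.0, 0.0]),
    ("r2", [0.99, 0.01, 0.0]),  # almost the same direction as r1
    ("r3", [0.0, 1.0, 0.0]),
]
print(find_duplicates(records))  # → ['r2']
```

This pairwise loop is quadratic in the number of reviews, which is fine for illustration; for the full dataset, letting the database perform the nearest-neighbor search is the more practical route.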

Launching a notebook

Launch our semantic_text_deduplication notebook in Colab:

<a className="w-64" href="https://colab.research.google.com/github/supabase/supabase/blob/master/examples/ai/semantic_text_deduplication.ipynb">Open in Colab</a>

At the top of the notebook, you'll see a Copy to Drive button. Click it to save a copy of the notebook to your Google Drive.

Connecting to your database

In the notebook, find the cell that defines DB_CONNECTION. It contains code like this:

```python
import vecs

# Placeholder: replace with your own database's connection string
DB_CONNECTION = "postgresql://<user>:<password>@<host>:<port>/<db_name>"

# create vector store client
vx = vecs.create_client(DB_CONNECTION)
```

Replace the value of DB_CONNECTION with the connection string for your own database. You can find the Postgres connection string in the Database Settings of your Supabase project.
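If you'd rather not paste credentials into a notebook cell, one option is to read the connection string from an environment variable. This is a minimal sketch, assuming a variable named `DB_CONNECTION` (a hypothetical convention, not something Vecs requires); `get_connection_string` is likewise a hypothetical helper.

```python
import os


def get_connection_string(var_name: str = "DB_CONNECTION") -> str:
    """Read the Postgres connection string from an environment variable.

    The default variable name is a hypothetical convention for this sketch.
    """
    value = os.environ.get(var_name)
    if not value:
        raise RuntimeError(f"Set the {var_name} environment variable to your connection string")
    return value


# Example usage in the notebook (requires the `vecs` package and a reachable database):
# import vecs
# vx = vecs.create_client(get_connection_string())
```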

Vecs connects to Postgres through SQLAlchemy, which requires the connection string to start with postgresql:// rather than postgres://. If the string you copy from the dashboard starts with postgres://, change the prefix before using it.
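The prefix fix can also be done in code. This is a minimal sketch with a hypothetical helper, `to_sqlalchemy_url`, which rewrites only the scheme and leaves the rest of the string untouched.

```python
def to_sqlalchemy_url(conn_str: str) -> str:
    """Rewrite a postgres:// URL to the postgresql:// scheme SQLAlchemy expects."""
    prefix = "postgres://"
    if conn_str.startswith(prefix):
        return "postgresql://" + conn_str[len(prefix):]
    # Already postgresql:// (or some other scheme): leave it unchanged.
    return conn_str


print(to_sqlalchemy_url("postgres://user:pw@host:6543/postgres"))
# → postgresql://user:pw@host:6543/postgres
```

Note that "postgresql://" does not start with "postgres://" (the character after "postgres" differs), so already-correct strings pass through unchanged.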

You must use the connection pooling string (host ending in .pooler.supabase.com) with Google Colab, because Colab does not support IPv6.
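To catch a direct (non-pooler) connection string before Colab fails to reach it, you could check the host name. This is a minimal sketch using the standard library's `urllib.parse.urlsplit`; the `uses_pooler` helper and the example hosts are hypothetical, and it assumes any special characters in the password are URL-encoded so the string parses correctly.

```python
from urllib.parse import urlsplit


def uses_pooler(conn_str: str) -> bool:
    """Return True if the connection string's host ends in .pooler.supabase.com."""
    host = urlsplit(conn_str).hostname or ""  # hostname is None if no host is found
    return host.endswith(".pooler.supabase.com")


# Hypothetical example hosts, for illustration only.
print(uses_pooler("postgresql://user:pw@example-region.pooler.supabase.com:6543/postgres"))  # → True
print(uses_pooler("postgresql://user:pw@db.example.supabase.co:5432/postgres"))  # → False
```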

Stepping through the notebook

Now all that's left is to step through the notebook. Run each code cell by clicking the "execute" button at its top left or by pressing Ctrl+Enter. The notebook guides you through creating a collection, adding data to it, and querying it.
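For reference, Vecs stores each record as an `(id, vector, metadata)` tuple. The sketch below shows one way to shape reviews and their embeddings into that form before upserting; the `build_records` helper, the `review-<n>` id scheme, the `"text"` metadata key, and the collection name `reviews` are hypothetical choices, not taken from the notebook. The commented lines show how the records might then be written and queried, assuming a live database and the `vx` client from the connection cell; parameter names follow recent Vecs releases, so check them against your installed version.

```python
def build_records(reviews, embeddings):
    """Pair each review with its embedding as (id, vector, metadata) tuples.

    The id scheme and the "text" metadata key are hypothetical choices.
    """
    if len(reviews) != len(embeddings):
        raise ValueError("need exactly one embedding per review")
    return [
        (f"review-{i}", list(vector), {"text": text})
        for i, (text, vector) in enumerate(zip(reviews, embeddings))
    ]


# Sketch only (needs `vecs`, a live database, and `vx` from the connection cell).
# all-MiniLM-L6-v2 produces 384-dimensional embeddings.
# reviews_col = vx.get_or_create_collection(name="reviews", dimension=384)
# reviews_col.upsert(records=build_records(reviews, embeddings))
# reviews_col.create_index()
# reviews_col.query(data=embeddings[0], limit=5)
```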

You can view the inserted items in the Table Editor by selecting the vecs schema from the schema dropdown.


Next steps

You can now start building your own applications with Vecs. Check our examples for ideas.